1. INTRODUCTION

Steganography has become one of the most critical approaches used to secure data. The word 
steganography means hiding secret data inside a cover such as text or another digital format. It aims to 
conceal the secret data exchanged between two parties so that it is not visible to a third party and raises 
no suspicion that any hidden information exists. Steganography is divided by cover medium into two 
types: digital steganography and natural language steganography. Digital steganography is 
the art that deals with digital media, for example image, video, and audio, while natural language 
steganography deals with text files. Although digital steganography has received the most attention 
from researchers, text is the most critical data that needs to be secured, because most 
documentation is in text form, for example when sending critical information or assigning urgent 
appointments [1]. Natural language steganography is further classified into two types: 
linguistic steganography and text steganography. Linguistic steganography hides a text (the secret message) 
in a text medium [2], whereas text steganography changes the format of the text 
or of specific characters without changing the meaning of the sentences [3, 4]. Each type has its own 
techniques. Text steganography uses word-rule based and feature-based techniques [5], while 
linguistic steganography uses five techniques: synonym substitution, syntactic substitution, semantic 
substitution, probabilistic context-free grammar (PCFG), and hybrid techniques. 

The word-rule based technique embeds the secret message by shifting the text 
horizontally or vertically, and it consists of two categories: line-shift coding and word-shift 
coding [6]. The feature-based technique changes a feature of the characters 
according to a code word, for example by shifting characters slightly up or down or by changing their 
length, to embed the message in the text [7]. This paper presents a review of feature-based 
and word-rule based techniques by analyzing the advantages and drawbacks of each technique. 
The performance metrics for text steganography techniques are also highlighted. 


2. PERFORMANCE OF TECHNIQUES IN TEXT STEGANOGRAPHY 

Many techniques are used in text steganography to improve the performance 
of embedding the hidden message. This section reviews the performance of these techniques, grouped 
into feature-based and word-rule based techniques. 


2.1. Feature-based technique 

The feature-based technique works with the shape or the format of the text, for example by changing 
the size or the font type. The changes are meant to go unnoticed, so that the reader assumes 
the text is unaltered and cannot recognize the secret message hidden in the cover text. Based on current 
studies, there are two types of feature-based techniques: language-based (which concentrates on 
the language used) and letter-based (which can be implemented in any language that uses the letters A-Z) [8]. 

Tables 1 and 2 summarize the two types of feature-based technique. Language-based 
techniques increase the robustness of the text steganography algorithm. However, enhancing 
the capacity decreases the security strength. Because the quality and the security of a text 
steganography algorithm are related, an increase in capacity also degrades the quality of 
the cover text and makes the scheme easier to attack. Letter-based techniques, in contrast, offer 
both high capacity and robustness. 
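
As a rough illustration of a language-based feature technique, the sketch below follows the diacritic rule summarized in Table 1 for [13, 14]: a diacritic is kept when the secret bit is 1 and removed when the secret bit is 0. It is a minimal sketch and not the cited authors' implementation. The function names, the use of the Unicode range U+064B to U+0652 as the set of diacritics, and the assumption that the receiver holds the fully diacritized cover text are all introduced here for illustration.

```python
# Illustrative sketch only; names and the extraction rule are assumptions.
HARAKAT = {chr(code) for code in range(0x064B, 0x0653)}  # Arabic marks U+064B..U+0652


def embed_bits(cover: str, bits: str) -> str:
    """Keep a diacritic for bit '1' and drop it for bit '0'."""
    slots = sum(ch in HARAKAT for ch in cover)
    if len(bits) > slots:
        raise ValueError(f"cover offers {slots} diacritics, message needs {len(bits)}")
    remaining = iter(bits)
    out = []
    for ch in cover:
        # Letters are always copied. A diacritic consumes one bit; once the
        # message is exhausted, the remaining diacritics are kept unchanged.
        if ch not in HARAKAT or next(remaining, "1") == "1":
            out.append(ch)
    return "".join(out)


def extract_bits(cover: str, stego: str, n_bits: int) -> str:
    """Compare the stego text with the fully diacritized cover (assumed shared)."""
    bits = []
    pos = 0  # read position in the stego text
    for ch in cover:
        if ch in HARAKAT:
            kept = pos < len(stego) and stego[pos] == ch
            bits.append("1" if kept else "0")
            if kept:
                pos += 1
        else:
            pos += 1
    return "".join(bits[:n_bits])


# Cover: three letters, each followed by a fatha (U+064E), so three bits fit.
cover = "\u0643\u064E\u062A\u064E\u0628\u064E"
stego = embed_bits(cover, "101")
print(extract_bits(cover, stego, 3))
```

The capacity of such a scheme is bounded by the number of diacritics in the cover text, one bit per diacritic.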


Table 1. Review on feature-based technique used (language-based) 

| Techniques used | Proposed methods/technique | Advantages | Disadvantages |
|---|---|---|---|
| English-based | Encrypt the text by DES, then embed by counting the equivalent position for each secret message and cover text [9]. | High security [9]. | Low capacity because the interactivity is very simple [9]. |
| English-based | The characters of the secret message are converted to binary, then replaced by the ASCII characters [10]. | Robust and high capacity because of hiding eight bits in one character [10]. | Security should be increased [10, 11]. |
| English-based | Two-letter word: the secret message is embedded according to the location of two letters [11]. | High capacity and robust [11]. | Security should be increased [10, 11]. |
| English-based | SEFS method: upper-case letters, punctuation and white space [12]. | Better capacity, and robust [12]. | |
| Arabic-based | A diacritic is present if the secret bit is 1; if the secret bit is 0, the diacritic is removed [13, 14]. | Better capacity, and the method is easy to implement [13, 14]. | Weak robustness [13, 14]. |
| Arabic-based | Uses the isolated letters to hide the secret message [15]. | High capacity and robust [15]. | |
| Arabic-based | Looks for the non-connected letters to hide the secret message [16]. | Higher capacity than [15] because of the high ratio for the characters; the proposed method also has a robust scheme [16]. | |
| Arabic-based | Uses HARAKAT to hide the text in the reverse Fatha [17]. | High capacity, robust [17]. | Information is lost in case of retyping [17]. |
| Chinese-based | A multi-keyword carrier-free text steganography method based on part-of-speech tagging has been proposed [18]. | High capacity [18, 19]. | Security should be improved by applying encryption [18]. |
| Chinese-based | Coverless plain text steganography based on characters' features has been proposed to find the common features of the characters to be represented in binary (0 and 1) [19]. | High capacity [18, 19]. | Generating long sentences leads to poor readability (low security) [19]. |

Review: By reviewing the previous research, it was noticed that increasing the capacity decreases the security because the quality will be low. 

Table 2. Review on feature-based technique used (letter-based) 

| Techniques used | Proposed methods/technique | Advantages | Disadvantages |
|---|---|---|---|
| SEFT [20] | Changing the font for the selected character. It needs three characters from the cover text to hide one character from the secret message. | High capacity and robust [20]. | Some fonts take a large size when replaced with their similarity [20]. |
| XORSTEG [21] | Works with the binary of the character and XORs the first and the last word in the same paragraph. | High capacity [21]. | |
| CALP, CURVE, VERT [5] | Compares the CALP, CURVE, and VERT techniques in terms of capacity and embedding time. | CALP has the highest hiding capacity. Also, the larger the secret text, the longer the embedding time. | VERT has the lowest capacity. |
| Polish text steganography [22] | Hiding a "0" bit in the pointed letter and a "1" bit in the un-pointed letter. | Robust, secured, reliable. | Capacity has not been considered by the author. |
| Hypertext markup [23] | Using HTML tags and attributes to hide the secret message. | High capacity. | Time complexity should be decreased. |

Review: [...] letter-based method. 


2.2. Word-rule based 

Word-rule based techniques hide the secret message by shifting the text horizontally or 
vertically. There are two techniques: line-shift coding, which conceals the secret message 
vertically, and word-shift coding, which hides the secret message horizontally in the text. As shown 
in Table 3, the embedding processing performance is high as long as the capacity is 
low. Moreover, there is an inverse relationship between security 
and embedding capacity. 
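
The word-shift idea can be sketched in a few lines. The example below follows the rule listed in Table 3 for word-shift coding, where bit 0 leaves the space between neighboring words unchanged and bit 1 changes it, and it converts the secret message to ASCII and then to binary as the reviewed methods do. Using a double space as the changed gap, the function names, and passing the message length to the extractor are assumptions made for this illustration, not details taken from the cited works.

```python
import re


def text_to_bits(message: str) -> str:
    """ASCII code of each character written as 8 binary digits."""
    return "".join(format(byte, "08b") for byte in message.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits; a trailing partial byte is ignored."""
    whole = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, whole, 8)).decode("ascii")


def embed(cover: str, message: str) -> str:
    """Hide one bit per inter-word gap: single space for 0, double space for 1.

    Existing whitespace in the cover is normalized to single spaces first.
    """
    words = cover.split()
    bits = text_to_bits(message)
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for this message")
    pieces = [words[0]]
    for index, word in enumerate(words[1:]):
        changed = index < len(bits) and bits[index] == "1"
        pieces.append(("  " if changed else " ") + word)
    return "".join(pieces)


def extract(stego: str, n_chars: int) -> str:
    """Read the gaps back: a double space is 1, a single space is 0."""
    gaps = re.findall(r" +", stego)
    bits = "".join("1" if len(gap) == 2 else "0" for gap in gaps)
    return bits_to_text(bits[:8 * n_chars])


# Two characters need 16 bits, so the cover must contain at least 17 words.
cover = " ".join(["word"] * 17)
stego = embed(cover, "Hi")
print(extract(stego, 2))
```

Each gap carries a single bit, so the capacity is low, and collapsing the double spaces erases the message, which is in line with the weak robustness that Table 3 notes for space-based embedding.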


Table 3. Review on word-rule based technique used 

| Technique used | Proposed methods/technique | Advantages | Disadvantages |
|---|---|---|---|
| Line-shift code [3] | Converting the secret message into ASCII values and then to binary. Convert the binary bit with the number of shifted rows. A random binary bit will be added if the binary bit is less than the number of shifted rows. | High robustness and high capacity | Low performance |
| Combination (line-shift + word-shift) [24] | The secret message is converted to ASCII code and then to binary. Then, the bits are stored in blocks of 2 bits to embed them into the cover file. | High capacity | Weak robustness, because by deleting a space, the hidden data will be damaged |
| Word-shift [25-27] | Hide the secret message bit 0 in the unchanged space, and bit 1 in the changed space between the neighbors. | Can be used with PDF documents | |
| Word-shift [25-27] | Changing the document by shifting the location of the word horizontally in the same line. | High security | Low capacity |
| Word-shift [25-27] | Encrypt the message using AES, then convert the text to binary, and after that embed the bits into the white space. | High security | Time consuming and low capacity |

Review: The capacity is increased by reducing [...] technique. 


3. PERFORMANCE METRIC ON TEXT STEGANOGRAPHY 

The performance metrics for text steganography help to evaluate the performance 
of the embedding process. They are embedding time, capacity, imperceptibility, robustness, security, 
availability, confidentiality, and integrity. Each metric has its own usefulness in ensuring 
the quality of a technique. Table 4 describes the purpose of each performance metric and lists 
the text steganography studies that used it. 

Figure 1 illustrates the percentage of use of each performance metric of text steganography: 
capacity, imperceptibility, robustness, security, availability, confidentiality, integrity, and embedding time. 
The most frequently evaluated metric is capacity at 29%, followed by security at 24%. 
Robustness and embedding time follow with 16% and 14%, respectively. 
Confidentiality, availability, and integrity are the metrics least used 
by researchers, at 4%, 3%, and 3%, respectively. 
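
The source does not state how the percentages in Figure 1 were obtained. One plausible reading, assumed here for illustration, is that each share is the number of references listed for a metric in Table 4 divided by the total number of listed references. The sketch below recomputes the shares on that assumption. The helper name count_refs is hypothetical, and the results are close to, but not in every case identical to, the values quoted from Figure 1.

```python
# Reference lists copied from Table 4; ranges such as "26-32" are inclusive.
TABLE_4 = {
    "Embedding time": "3, 5, 23, 26-32",
    "Hiding capacity": "3, 6, 11, 17, 18, 20, 22, 25, 27, 30, 33-43",
    "Imperceptibility": "11, 17, 37, 44",
    "Robustness": "2, 3, 12, 15, 16, 22, 24, 38, 45-48",
    "Security": "2, 6, 8, 10, 11, 13-15, 18, 21-23, 26, 29, 40, 47-49",
    "Availability": "18, 26",
    "Confidentiality": "27, 34, 45",
    "Integrity": "31, 34",
}


def count_refs(spec: str) -> int:
    """Count the references in a list such as '3, 5, 23, 26-32'."""
    total = 0
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        total += (int(last) - int(first) + 1) if last else 1
    return total


counts = {metric: count_refs(refs) for metric, refs in TABLE_4.items()}
total = sum(counts.values())
shares = {metric: 100 * n / total for metric, n in counts.items()}

# Print the metrics from most to least used.
for metric, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{metric:16s} {counts[metric]:3d} refs  {share:5.1f}%")
```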


Table 4. Performance metric in text steganography used by researchers 

| Parameter metric | Resource | Purpose |
|---|---|---|
| Embedding time | [3, 5, 23, 26-32] | Calculates the time that the embedding process of the technique consumes. |
| Hiding capacity | [3, 6, 11, 17, 18, 20, 22, 25, 27, 30, 33-43] | It is the maximum number of bits that could be embedded in the cover medium. |
| Imperceptibility | [11, 17, 37, 44] | Hiding the message in a way that human vision cannot perceive the difference in the stego medium. |
| Robustness | [2, 3, 12, 15, 16, 22, 24, 38, 45-48] | It measures how difficult it is for attacks or manipulation processes to destroy the technique. |
| Security | [2, 6, 8, 10, 11, 13-15, 18, 21-23, 26, 29, 40, 47-49] | It protects the secret message from internal and external attacks. |
| Availability | [18, 26] | It is the ability of the authorized person to get access to the hidden data. |
| Confidentiality | [27, 34, 45] | It is a measurement to ensure that the secret data will not be available to unauthorized people. |
| Integrity | [31, 34] | It ensures that the data is not altered by unauthorized people during transmission. |
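
Two of the metrics in Table 4, hiding capacity and embedding time, are straightforward to measure in code. The sketch below is illustrative only. It expresses capacity as hidden bits per cover character, which is one common convention and an assumption here, since the source defines capacity only as the maximum number of embeddable bits. It times an arbitrary embedding function with Python's time.perf_counter. The function names and the toy embedding function are hypothetical.

```python
import time
from typing import Callable


def capacity_ratio(hidden_bits: int, cover: str) -> float:
    """Hidden bits per character of cover text (assumed convention)."""
    if not cover:
        raise ValueError("cover text must not be empty")
    return hidden_bits / len(cover)


def embedding_time(embed: Callable[[str, str], str], cover: str, bits: str,
                   repeats: int = 5) -> float:
    """Best wall-clock time in seconds over several runs of embed(cover, bits)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        embed(cover, bits)
        best = min(best, time.perf_counter() - start)
    return best


def toy_embed(cover: str, bits: str) -> str:
    """Placeholder scheme: one bit per word gap, double space for '1'."""
    words = cover.split()
    n_gaps = max(len(words) - 1, 0)
    padded = bits[:n_gaps].ljust(n_gaps, "0")  # unused gaps stay single spaces
    gaps = ["  " if bit == "1" else " " for bit in padded] + [""]
    return "".join(word + gap for word, gap in zip(words, gaps))


cover = "the quick brown fox jumps over the lazy dog"
bits = "10101010"  # eight bits fit into the eight gaps of this cover
print(f"capacity: {capacity_ratio(len(bits), cover):.3f} bits per character")
print(f"embedding time: {embedding_time(toy_embed, cover, bits):.6f} s")
```

Taking the best of several runs reduces the influence of background load on the timing. Any embedding function with the same two-argument signature can be passed in place of toy_embed.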

Figure 1. Percentage of performance metric used 


4. DISCUSSION AND CONCLUSION 

This paper has presented a review and analysis of text steganography techniques. Text steganography 
has two main techniques: feature-based and word-rule based. Each technique has enhanced 
variants intended to improve the embedding performance of text steganography. The review found that 
the technique most used by researchers is feature-based, at 76%, owing to its simplicity 
of implementation and its security strength, as shown in Figure 2. Based on Figure 3, the common 
metrics used to evaluate text steganography are security at 34%, capacity 
at 24%, robustness at 23%, and embedding time at 
19%. 

In conclusion, the most used method for text steganography is feature-based, 
because of its security and its simplicity of implementation. This paper has also 
identified the performance metrics used in evaluating feature-based techniques, which are security, 
capacity, robustness, and embedding time. 